[MFMA] Switch between MFMA types #352
Conversation
First, #251 needs to be merged.
Force-pushed from 15c204d to 1b3f6a0
@@ -0,0 +1,251 @@
#include "mlir/IR/TypeUtilities.h"
I've separated this code out of the common AccelerateMatmul pass so I can add an additional option to it.
Do you think it is OK to do this in this PR, or is it better to split it into a separate one?
@@ -309,14 +310,15 @@ def make_hash(fn, arch, env_vars, **kwargs):
    num_ctas = kwargs.get("num_ctas", 1)
    num_stages = kwargs.get("num_stages", 3)
    waves_per_eu = kwargs.get("waves_per_eu", 0)
    matrix_instr_nonkdim = kwargs.get("matrix_instr_nonkdim", 0);
@alefimov-amd @oplavsic After dealing with tuning parameters for a while, I'm wondering why we need to add new tuning parameters explicitly instead of treating them as constants?
The only benefit of adding them explicitly is that we can still tune them even when they are not explicitly defined as kernel arguments.
Do we have an example of such a use?
Maybe I did not understand your idea correctly, but I feel this could be more error-prone.
P.S. I also feel that adding tons of parameters is not the best way, and we probably need to find a more elegant way to add them.
When I added pre_load_v as a tuning parameter, I just added it to the autotuner configs and as a tl.constexpr kernel argument. Nothing changed in compiler.py, and it is treated like BLOCK_M rather than num_warps.
It seems that there are two kinds of kernel arguments: meta-parameters like BLOCK_M, and compilation options like num_warps, according to the explanation here: https://github.com/ROCmSoftwarePlatform/triton/blob/461d72e5477d1659dc05e10060db4db3907c958f/python/tutorials/03-matrix-multiplication.py#L162
The only difference between the two is whether we can set default values for them. For meta-parameters, if nothing is set, there will be an error like missing 1 required positional argument: 'pre_load_v'.
P.S. I think both kinds are compilation options, since the kernel needs to be recompiled whenever a value changes.
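To make this concrete, here is a minimal sketch of such a frontend-only tuning parameter, assuming the standard triton.autotune / triton.Config API (the kernel body and the tuning key are placeholders; only pre_load_v comes from this discussion):

import triton
import triton.language as tl

# pre_load_v lives entirely in the Python frontend: the autotuner
# substitutes it like any other meta-parameter, each distinct value
# triggers a recompile, and compiler.py needs no changes.
@triton.autotune(
    configs=[
        triton.Config({'BLOCK_M': 64, 'pre_load_v': True}, num_warps=4),
        triton.Config({'BLOCK_M': 64, 'pre_load_v': False}, num_warps=4),
    ],
    key=['M'],
)
@triton.jit
def kernel(x_ptr, M, BLOCK_M: tl.constexpr, pre_load_v: tl.constexpr):
    ...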
This is an interesting approach; it is definitely worth trying.
I have only one concern about it: the user has to declare this constant themselves, and if they make a mistake, the mistake will not be reported.
For example, we use the MATRIX_INSTR_NONKDIM constant to control MFMA behavior, and a user can write this code:

@triton.jit
def kernel(MTRIX_INSTR_NONKDIM: tl.constexpr):  # note the misspelled name
    ...

kernel[grid](MTRIX_INSTR_NONKDIM = 16)

This code is correct in terms of the language, but it does not do what we want, and the mistake is not reported.
I am thinking that maybe we can introduce an additional decorator to pass AMD-specific options to a kernel without messing with the upstream interfaces.
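For illustration, a hypothetical decorator along those lines might look like the sketch below; amd_options and the _amd_options attribute are invented names, not an existing API:

import triton

def amd_options(**options):
    # Stash AMD-specific options on the jitted function so a downstream
    # compile step could read them, without touching the upstream
    # kernel-launch interface.
    def wrapper(jit_fn):
        jit_fn._amd_options = dict(options)
        return jit_fn
    return wrapper

@amd_options(matrix_instr_nonkdim=16, waves_per_eu=2)
@triton.jit
def kernel(x_ptr):
    ...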
I see your point. MATRIX_INSTR_NONKDIM and waves_per_eu are needed explicitly in the lowering passes. However, pre_load_v and BLOCK_M are only needed in the Python-level frontend, so the compile() function doesn't care about them.
Upstream has already added a lot of Hopper-specific parameters to the list, which is far from clean.
I agree with you that we should have a "bag" for AMD options, and we should also suggest that upstream put all these NVIDIA parameters into another bag.
Force-pushed from 1b3f6a0 to f43b54e
Force-pushed from f43b54e to fcdb690
Option<"matrixCoreVersion", "matrix-core-version", | ||
"int32_t", /*default*/"0", | ||
"device matrix core version">, | ||
Option<"matrixInstructionSize", "matrix-instructio-size", |
typo
"device matrix core version">, | ||
Option<"matrixInstructionSize", "matrix-instructio-size", | ||
"int32_t", /*default*/"0", | ||
"enforce matrix intrucion MN size"> |
typo
// layout are 32 apart: [[0 0 0 0 32 32 32 32 ...] [1 1 1 1 33 33 33 33
// ...] ...]. for mfma 16x16 adjacent threads in y dimension in
// transposed MFMA layout are 16 apart: [[0 0 0 0 16 16 16 16 32 32 32
// 32 ...] [1 1 1 1 33 33 33 33 ...] ...].
Is it possible to get the waveSize from the gpu dialect or mfma layout?
Unfortunately, no...
However, the MFMA layout appears in the IR only if the target is a CDNA architecture, which supports only wave64 mode. I think it should be safe to use a constant here.
In my opinion, we should report the MFMA layout on a non-CDNA GPU as an error.
If you really want, it is possible to infer waveSize from mfmaLayout by computing the product of mfmaLayout.threadsPerWarp. But that is a little "ugly" in my opinion.
And you can also get it from the GPU dialect, as is done here: https://github.com/ROCmSoftwarePlatform/triton/blob/4d539d7dae055bb6b8dbb1b2b380118333250f15/lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp#L589
This PR introduces the matrix_instr_nonkdim flag to switch between MFMA 16 and MFMA 32.
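As a rough usage sketch, assuming the kwarg plumbing added in this PR (the kernel name, grid, and other arguments are placeholders), the flag is passed at launch time the same way as num_warps:

# matrix_instr_nonkdim=16 selects MFMA 16x16, 32 selects MFMA 32x32,
# and 0 (the default) leaves the choice to the existing heuristic.
matmul_kernel[grid](
    a_ptr, b_ptr, c_ptr, M, N, K,
    BLOCK_M=128, BLOCK_N=128, BLOCK_K=64,
    matrix_instr_nonkdim=16,
)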
LGTM
Some notes:
- Documentation about AMD MFMA instruction usage is planned for a future PR
- [GEMM] [Tuning] Parameterize mfma type #366 is needed for the GEMM tuning script to use the correct MFMA type
Force-pushed from fcdb690 to c0a0664